scaling, showing only minibatch centering is required. Their work provides valuable information for research on the BNN training process. The experiments of Alizadeh et al. [2] show that most of the tricks commonly used in training binary models, such as gradient and weight clipping, are only required during the final stages of training to achieve the best performance.

XNOR-Net++ [26] provides a new training algorithm for 1-bit CNNs based on XNOR-Net. Compared to XNOR-Net, this method fuses the activation and weight scaling factors into a single scaling factor that is learned discriminatively through backpropagation. They also explore different ways of constructing the shape of this scale factor under the constraint that the computational budget remains fixed.
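
A minimal PyTorch sketch of this idea follows; the module name XnorPPConv2d, the straight-through sign binarization, and the choice of a per-output-channel shape for the learned scale are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class XnorPPConv2d(nn.Module):
    """Sketch of an XNOR-Net++-style binary convolution: one learned scaling
    tensor (here a per-output-channel scalar, one of several possible shapes)
    replaces the analytically computed activation and weight scale factors of
    XNOR-Net and is trained by backpropagation."""

    def __init__(self, in_ch, out_ch, k, stride=1, padding=0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.gamma = nn.Parameter(torch.ones(out_ch, 1, 1))  # learned scale
        self.stride, self.padding = stride, padding

    def forward(self, x):
        # Straight-through estimator: sign in the forward pass,
        # identity gradient in the backward pass.
        bx = x + (torch.sign(x) - x).detach()
        bw = self.weight + (torch.sign(self.weight) - self.weight).detach()
        out = F.conv2d(bx, bw, stride=self.stride, padding=self.padding)
        return out * self.gamma  # fused, discriminatively learned scaling


layer = XnorPPConv2d(16, 32, 3, padding=1)
print(layer(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 32, 8, 8])
```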

Borrowing an idea from the Alternating Direction Method of Multipliers (ADMM), Leng et al. [128] decouple the continuous parameters from the discrete constraints of the network and divide the original hard problem into several subproblems. These subproblems are solved with extragradient and iterative quantization algorithms, leading to considerably faster convergence than conventional optimization methods.
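
The toy sketch below illustrates only the decoupling structure (primal update, projection onto the binary set, dual update) on a dummy quadratic objective; project_to_binary, admm_binarize, the stand-in loss, and the plain gradient step (in place of the paper's extragradient update) are all simplifications for illustration.

```python
import numpy as np


def project_to_binary(v):
    """Iterative-quantization-style projection of a real vector onto the set
    {-alpha, +alpha}^n; here alpha is simply the mean magnitude."""
    return np.abs(v).mean() * np.sign(v)


def admm_binarize(w, rho=0.1, steps=50):
    """Toy sketch of the ADMM-style decoupling: the continuous weights w, an
    auxiliary quantized copy q, and a dual variable u are updated alternately.
    The w-update is one gradient step on a dummy quadratic loss; the real
    method uses an extragradient update on the network loss."""
    target = np.random.randn(*w.shape)          # stand-in for the task loss
    q = project_to_binary(w)
    u = np.zeros_like(w)
    for _ in range(steps):
        # w-update: descend on loss + (rho/2) * ||w - q + u||^2
        grad = (w - target) + rho * (w - q + u)
        w = w - 0.1 * grad
        # q-update: project the shifted weights onto the quantized set
        q = project_to_binary(w + u)
        # dual update
        u = u + w - q
    return q


w0 = np.random.randn(64)
print(np.unique(np.round(admm_binarize(w0), 4)))  # typically two values: -alpha, +alpha
```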

Deterministic Binary Filters (DBFs) [225] learn weighted coefficients of predefined orthogonal binary bases instead of directly learning the convolutional filters as in the conventional approach. The filters are expressed as linear combinations of orthogonal binary codes and can therefore be generated very efficiently in real time.
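
The following sketch shows this idea for a single layer, assuming a Hadamard matrix as the fixed orthogonal binary basis and a filter dimension that is a power of two; DBFConv2d and its details are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two);
    its rows form an orthogonal set of +1/-1 basis vectors."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H


class DBFConv2d(nn.Module):
    """Sketch of a DBF-style layer: each convolutional filter is a learned
    linear combination of fixed orthogonal binary basis filters rather than a
    freely learned real-valued tensor."""

    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        d = in_ch * k * k
        assert d & (d - 1) == 0, "filter dimension must be a power of two here"
        self.register_buffer("basis", hadamard(d))                # fixed +1/-1 basis
        self.coeff = nn.Parameter(torch.randn(out_ch, d) * 0.01)  # learned coefficients
        self.in_ch, self.k = in_ch, k

    def forward(self, x):
        # Generate the filters on the fly from the binary basis.
        w = (self.coeff @ self.basis).view(-1, self.in_ch, self.k, self.k)
        return F.conv2d(x, w)


layer = DBFConv2d(in_ch=8, out_ch=16, k=4)     # 8 * 4 * 4 = 128 = 2**7
print(layer(torch.randn(1, 8, 10, 10)).shape)  # torch.Size([1, 16, 7, 7])
```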

BWNH [91] trains binary weight networks by hashing. They first reveal the strong connection between inner-product preserving hashing and binary weight networks, showing that training binary weight networks can be intrinsically regarded as a hashing problem. They propose an alternating optimization method to learn the hash codes instead of directly learning binary weights.
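
The toy sketch below illustrates such an alternating scheme for a single filter: it alternates between a least-squares scale update and a coordinate-wise code update so that the scaled binary code preserves inner products with sampled activations. learn_hash_code and its update rules are a simplified, single-filter illustration, not the paper's algorithm.

```python
import numpy as np


def learn_hash_code(X, w, iters=20):
    """Find a +/-1 code b and scale a so that the inner products X.T @ (a * b)
    approximate X.T @ w for the real-valued weight vector w. X holds sample
    activations in its columns."""
    b = np.sign(w)
    b[b == 0] = 1.0
    G = X @ X.T                     # Gram matrix of the activations
    t = G @ w                       # inner products to be preserved
    for _ in range(iters):
        a = (b @ t) / max(b @ G @ b, 1e-12)   # scale: least-squares update
        for i in range(len(b)):               # code: coordinate-wise update
            r = t[i] - a * (G[i] @ b - G[i, i] * b[i])
            b[i] = 1.0 if r >= 0 else -1.0    # assumes a > 0
    return a, b


X = np.random.randn(32, 256)        # 256 activation samples of dimension 32
w = np.random.randn(32)
a, b = learn_hash_code(X, w)
print(a, b[:8])
```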

CI-BCNN [239] learns BNNs with channel-wise interactions for efficient inference. Unlike existing methods that directly apply XNOR and BITCOUNT operations, this method learns an interacted bitcount according to the mined channel-wise interactions. Inconsistent signs in the binary feature maps are corrected based on the prior knowledge provided by these interactions, so that the information of the input images is preserved in the forward propagation of BNNs. Specifically, they employ a reinforcement learning model to learn a directed acyclic graph for each convolutional layer that represents the implicit channel-wise interactions, and they obtain the interacted bitcount by adjusting the output of the original bitcount according to the effects exerted by the graph. They train the BCNN and the graph structure simultaneously.
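
As a very loose illustration of adjusting bitcount outputs by channel-wise interactions, the sketch below perturbs per-channel bitcount responses with a hand-made acyclic interaction matrix; the reinforcement-learning graph search and the sign-correction mechanism of the paper are not modeled, and interacted_bitcount, graph, and strength are purely illustrative stand-ins.

```python
import torch


def interacted_bitcount(popcount, graph, strength=0.1):
    """Adjust raw per-channel bitcount responses by the responses of the
    channels that influence them, as encoded in a directed adjacency matrix.
    popcount: (N, C, H, W) bitcount outputs; graph: (C, C), where
    graph[i, j] = 1 means channel i influences channel j."""
    influence = torch.einsum("nchw,cd->ndhw", popcount, graph)
    return popcount + strength * influence


popcount = torch.randint(-9, 10, (1, 4, 5, 5)).float()    # toy bitcount outputs
graph = torch.triu(torch.ones(4, 4), diagonal=1)           # toy acyclic interaction graph
print(interacted_bitcount(popcount, graph).shape)          # torch.Size([1, 4, 5, 5])
```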

BinaryRelax [272] is a two-phase algorithm for training CNNs with quantized weights, including binary weights. In the first phase, they relax the hard constraint into a continuous regularizer via the Moreau envelope [176], i.e., the squared Euclidean distance to the set of quantized weights, and gradually increase the regularization parameter to close the gap between the weights and the quantized state. In the second phase, they introduce the exact quantization scheme with a small learning rate to guarantee fully quantized weights.
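
A toy sketch of the relaxed first phase is shown below, using a quadratic stand-in loss: each step descends on the loss plus lambda times the squared distance to the quantized set, with lambda grown over time. project_to_quantized, relaxed_step, and the schedule are illustrative choices, and the exact-quantization second phase is omitted.

```python
import numpy as np


def project_to_quantized(w):
    """Nearest point to w in the binary set {-alpha, +alpha}^n (alpha taken as
    the mean magnitude) and the squared Euclidean distance to it."""
    q = np.abs(w).mean() * np.sign(w)
    return q, np.sum((w - q) ** 2)


def relaxed_step(w, grad_loss, lam, lr=0.05):
    """One relaxed-phase update: a gradient step on
    loss(w) + lam * dist(w, quantized set)^2."""
    q, _ = project_to_quantized(w)
    return w - lr * (grad_loss + lam * 2.0 * (w - q))


target = np.random.randn(16)               # quadratic stand-in for the task loss
w, lam = np.random.randn(16), 1e-3
for _ in range(300):
    w = relaxed_step(w, grad_loss=(w - target), lam=lam)
    lam = min(lam * 1.05, 10.0)            # grow the regularization weight, capped for stability
print(project_to_quantized(w)[1])          # small: weights pulled close to the quantized set
```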

CBCNs [149] propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features through circulant backpropagation. A CiF is a 4D tensor of size K × K × H × H generated from a learned filter and a circulant transfer matrix M, which rotates the learned filter to different angles. The original 2D H × H learned filter is expanded to 3D by replicating it three times and concatenating the copies, yielding the 4D CiF, as shown in Fig. 1.7. The method improves the representation capacity of BNNs without changing the model size.
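
The snippet below sketches only the rotation step of this construction, assuming rotations by multiples of 90 degrees stand in for the circulant transfer matrix M; the exact 4D layout of Fig. 1.7 and the circulant backpropagation are not reproduced.

```python
import torch


def circulant_filter_views(w2d, K=4):
    """K copies of one learned H x H filter, each rotated to a different
    angle, as a stand-in for the CiF construction of CBCNs."""
    return torch.stack([torch.rot90(w2d, r, dims=(0, 1)) for r in range(K)])


w = torch.randn(3, 3)                 # one learned 2D filter
views = circulant_filter_views(w)
print(views.shape)                    # torch.Size([4, 3, 3])
```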

Rectified binary convolutional networks (RBCNs) [148] use a generative adversarial network (GAN) to train the 1-bit binary network with the guidance of its corresponding full-precision model, which significantly improves the performance of 1-bit CNNs. The rectified convolutional layers are generic and flexible and can be easily incorporated into existing DCNNs such as WideResNets and ResNets.
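
A minimal sketch of such GAN-style guidance is given below: a small discriminator is trained to separate full-precision features from binary-network features, and the binary network would add the corresponding adversarial term to its task loss. FeatureDiscriminator, the feature shapes, and the losses are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class FeatureDiscriminator(nn.Module):
    """Tiny MLP discriminator: trained to tell full-precision feature maps
    from the 1-bit network's feature maps, while the binary network is
    trained to fool it in addition to minimizing its usual task loss."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, feat):
        return self.net(feat.flatten(1))


bce = nn.BCEWithLogitsLoss()
D = FeatureDiscriminator(dim=128)
feat_fp = torch.randn(8, 128)         # features from the full-precision teacher
feat_bin = torch.randn(8, 128)        # features from the 1-bit student

# Discriminator step: real = full-precision, fake = binary.
d_loss = bce(D(feat_fp), torch.ones(8, 1)) + bce(D(feat_bin), torch.zeros(8, 1))
# Student (generator) step: push binary features to look full-precision.
g_loss = bce(D(feat_bin), torch.ones(8, 1))
print(d_loss.item(), g_loss.item())
```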